WHU-BioNLP CHEMDNER System with Mixed Conditional Random Fields and Word Clustering
Abstract
Our team participated in the Chemical Compound and Drug Name Recognition (CHEMDNER) task of BioCreative IV. We used mixed conditional random fields (CRFs) with word clustering to accomplish this task. On the one hand, we generated word features by word clustering and trained one model on the corpus with these features. On the other hand, we transformed the training corpus into a new one by reversing the order of its letters, and trained a second model on this reversed corpus. Finally, we mixed the two kinds of models, achieving a final F-measure of 86.07%.
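The abstract does not say exactly how the letters are reversed, how labels are handled in the reversed corpus, or how the two models are mixed. The following is a minimal sketch under stated assumptions, not the authors' code: it assumes a single entity type with BIO tags, assumes "reversed order of the letters" means reversing each sentence character by character (so both token order and the letters inside each token are reversed), and uses one hypothetical mixing rule (keep the forward model's entities and add any non-overlapping entities found by the reversed model). All function names are illustrative.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect entity spans from a BIO tag sequence (single entity type)."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new entity begins; close any entity that was open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
        # tag == "I" with an open entity: the entity simply continues.
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def spans_to_bio(spans: List[Span], n: int) -> List[str]:
    """Rebuild a BIO sequence of length n from non-overlapping spans."""
    tags = ["O"] * n
    for s, e in spans:
        tags[s] = "B"
        for i in range(s + 1, e):
            tags[i] = "I"
    return tags


def flip_spans(spans: List[Span], n: int) -> List[Span]:
    """Map spans between original and reversed token order (an involution)."""
    return [(n - e, n - s) for s, e in spans]


def reverse_sentence(tokens: List[str], tags: List[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical corpus transform: reverse the sentence letter by letter.

    Token order is reversed and each token's letters are reversed. Tags are
    re-derived from spans so that B again marks each entity's first token.
    """
    n = len(tokens)
    rev_tokens = [tok[::-1] for tok in reversed(tokens)]
    rev_spans = flip_spans(bio_to_spans(tags), n)
    return rev_tokens, spans_to_bio(rev_spans, n)


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def mix_predictions(fwd_tags: List[str], bwd_tags_reversed: List[str]) -> List[str]:
    """Hypothetical mixing rule: forward entities plus non-overlapping reversed-model entities."""
    n = len(fwd_tags)
    fwd = bio_to_spans(fwd_tags)
    # The reversed model tags the reversed sentence, so map its spans back first.
    bwd = flip_spans(bio_to_spans(bwd_tags_reversed), n)
    merged = fwd + [s for s in bwd if not any(overlaps(s, f) for f in fwd)]
    return spans_to_bio(sorted(merged), n)


if __name__ == "__main__":
    toks, tags = reverse_sentence(
        ["Treatment", "with", "acetic", "acid", "failed"],
        ["O", "O", "B", "I", "O"],
    )
    print(toks)  # ['deliaf', 'dica', 'citeca', 'htiw', 'tnemtaerT']
    print(tags)  # ['O', 'B', 'I', 'O', 'O']
```

Rebuilding the tags from spans matters because simply reversing the tag list would place I before B. For example, if the forward model tags only "aspirin" in "aspirin and acetic acid" and the reversed model tags "dica citeca" in the reversed sentence, `mix_predictions(["B", "O", "O", "O"], ["B", "I", "O", "O"])` returns `["B", "O", "B", "I"]`.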
Similar resources
CHEMDNER system with mixed conditional random fields and multi-scale word clustering
BACKGROUND Chemical compound and drug name recognition plays an important role in chemical text mining and is the basis for automatic relation extraction and event identification in chemical information processing. A high-performance named entity recognition system for chemical compound and drug names is therefore necessary. METHODS We developed a CHEMDNER system based on mixed conditional r...
A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature
BACKGROUND Chemical compounds and drugs (together called chemical entities) embedded in scientific articles are crucial for many information extraction tasks in the biomedical domain. However, only a very limited number of chemical entity recognition systems are publicly available, probably due to the lack of large manually annotated corpora. To accelerate the development of chemical entity r...
Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations
BACKGROUND Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite task before effective text mining can begin for biochemical-text data. Exploiting unlabeled text data to leverage system performance has been an active and challenging research topic in text mining due to the recent growth in the amount of biomedical literature. We present a semi-supervised learning m...
Extracting Gene Regulation Networks Using Linear-Chain Conditional Random Fields and Rules
Published literature in molecular genetics may collectively provide much information on gene regulation networks. Dedicated computational approaches are required to sift through large volumes of text and infer gene interactions. We propose a novel sieve-based relation extraction system that uses linear-chain conditional random fields and rules. Also, we introduce a new skip-mention data represen...
POSBIOTM-NER in the Shared Task of BioNLP/NLPBA2004
Two classifiers, Support Vector Machine (SVM) and Conditional Random Fields (CRFs), are applied here for the recognition of biomedical named entities. According to their different characteristics, the results of the two classifiers are merged to achieve better performance. We propose an automatic corpus expansion method for SVM and CRF to overcome the shortage of annotated training data. In addi...